ABSTRACT


THE  ORDNANCE  CENTER  AND  SCHOOL  USES  A  SIMULATION  MODEL  TO 
ANALYZE  RECOVERY  AND  MAINTENANCE  OPERATIONS  ON  THE  FUTURE 
BATTLEFIELD.  IT  RUNS  ON  A  PERSONAL  COMPUTER  AND  IS  PROGRAMMED 
WITH  COMMERCIAL  SLAMSYSTEM  SOFTWARE.  IT  SIMULATES  AN  EIGHT 
HOUR  ARMORED  BRIGADE  BATTLE  IN  A  EUROPEAN  SCENARIO.  THE  MODEL 
IS  USED  TO  EVALUATE  THE  PROBABLE  IMPACT  OF  IMPROVED  RECOVERY 
VEHICLES  AND  MAINTENANCE  VEHICLES  ON  AVERAGE  REPAIR  TIME 
NEEDED,  RECOVERY  TIME  REQUIRED,  AND  OTHER  PARAMETERS  OF 
INTEREST.  IT  IS  USEFUL  FOR  ANSWERING  TYPICAL  "WHAT  IF" 
QUESTIONS. 

AFTER  COMPLETING  A  RUN,  THE  MODEL  PROVIDES  DATA  SUCH  AS  THE 
NUMBER  OF  TANKS  AVAILABLE  AT  THE  END  OF  THE  BATTLE  AND  AT  THE 
END  OF  THE  DAY.  THIS  IS  COUNT  DATA.  THE  OBSERVED  COUNTS  FALL 
INTO  JUST  TWO  CATEGORIES,  "OPERATIONAL"  OR  "NOT  OPERATIONAL". 
WHEN  THIS  OCCURS,  THE  DATA  ARE  CALLED  BINOMIAL  DATA.  THE 
INVESTIGATOR'S  INTEREST  IS  IN  PROPORTIONS  -  THE  PERCENTAGE  OR 
NUMBER  OF  EVENTS  IN  ONE  OF  THE  TWO  CLASSES.  STATISTICAL 
METHODS  ARE  NEEDED  TO  ESTABLISH  CONFIDENCE  LIMITS  ON  THE 
PROPORTIONS  OBSERVED,  AND  TO  DEMONSTRATE  SIGNIFICANT 
DIFFERENCES. 

MATHEMATICALLY  EXACT  METHODS  FOR  ANALYZING  BINOMIAL  DATA 
EXIST.  HOWEVER,  THE  NECESSARY  COMPUTATIONS  ARE  EXTREMELY 
DEMANDING  AND  TIME  CONSUMING.  THE  USE  OF  PUBLISHED  BINOMIAL 
TABLES  PRESENTS  PRACTICAL  DIFFICULTIES  THAT  MAY  LEAD  TO 
INACCURACIES  IN  THE  FINAL  RESULTS.  EXISTING  "SHORT  CUT" 
APPROXIMATION  TESTS  ARE  FREQUENTLY  USED.  THESE  TESTS  OFTEN 
GIVE  GOOD  RESULTS,  BUT  OCCASIONALLY  THEY  YIELD  BAD  RESULTS. 

THE  USER  NEEDS  TO  EXERCISE  CARE  IN  ORDER  TO  DETECT  THE 
OCCURRENCE  OF  UNACCEPTABLE  RESULTS. 

OUR APPROACH TO BINOMIAL DATA ANALYSIS IS THE TOPIC OF THIS
REPORT.

BINOMIAL ANALYSIS OF RECOVERY AND MAINTENANCE
SIMULATION RESULTS - AIRLAND BATTLE-FUTURE

MR. ROBERT G. DICK AND MR. BRYAN A. BRICE
U.S. ARMY ORDNANCE CENTER AND SCHOOL

UNLIMITED DISTRIBUTION/PUBLIC RELEASE


INTRODUCTION 

The Ordnance Center and School uses a simulation model to analyze recovery
and maintenance operations on the future battlefield. It runs on a personal
computer and is programmed with commercial SLAMSYSTEM software. It simulates
an eight hour armored brigade battle in a European scenario.

One use of the model is to evaluate the probable impact of improved recovery
vehicles and maintenance vehicles. We can make different assumptions about
the average repair time needed, recovery time required, and other key
parameters. After completing a run, the model provides data such as tank
availability at the end of battle and at the end of the day, the number of
tanks evacuated, and the number not recovered. We use the model to answer
typical "what if" questions.

In order to generalize from this brigade size model, the battle simulation is
repeated with different random number seeds to produce different results that
might occur due to the laws of chance. Statistical methods are used to
establish confidence limits and demonstrate significant differences.

THE STATISTICAL PROBLEMS

We needed typical statistical capabilities for analyzing quantitative data.
Modest PC capabilities were available, and were subsequently enhanced through
the acquisition of commercial software. We also needed to analyze a
significant volume of count data. Our observed counts usually fall into just
two categories, e.g., "operational" or "not operational", "hit" or "miss",
"yes" or "no", etc. Our interest was in proportions - the percentage or
number of events in one of the two classes. For example, a simulation run
might show that 119 of the original 164 tanks are still operational after the
battle. This is binomial data. Our approach to binomial data analysis is
the topic of this paper.

THE  APPROACH 

Several options were available. We could treat our count data as if it were
measurement data, and use statistical methods commonly applied to continuous
data. We could use some of the "short cut" approximation tests that are
available for count data. We could use binomial methods and rely on
published binomial tables. We could obtain commercial software designed to
handle binomial data. All of these options have shortcomings that will be
discussed.

The option that we selected was to create our own binomial software for the
PC. This software is fully operational. It consists of two programs written
in FORTRAN, a language well equipped to handle the laborious calculations.

The first program creates a table of the cumulative binomial when the user
specifies a probability and any number of trials. The second program reads
binomial sample data entered by the user. It computes the proportion
observed, and several sets of confidence limits for this proportion. It
enables the user to test for statistically significant differences at various
significance levels. This software is available at no cost to the Department
of Defense community.

Further discussion will include some background on the binomial distribution,
the capabilities of the programs, some sample problems, hardware and software
requirements, and a few interesting aspects of the binomial that may not be
obvious.

SOME BACKGROUND ON THE BINOMIAL

The binomial distribution provides a method for handling count data for
populations where the observations fall into just two categories. The
investigator's interest is in proportions - the percentage or number of
events in one of the two classes. Some of the statistical literature refers
to this type of data as "quantal" data, or "dichotomous" data, or
"all-or-none" data. A two-class population has a very simple structure. It
can be described by giving the proportion of the members of the population
that fall in one class. In a random sample of size N, the probability of
getting exactly 0, 1, 2, 3, ..., N successes can be worked out.

By definition, the binomial distribution is the probability distribution of
the possible number of times that a particular event will occur in a sequence
of trials. If the event has a given probability of occurring during any one
trial, the binomial distribution states the probability of the event
occurring any certain exact number of times in a sequence of trials. This is
stated mathematically as, "The probability of the event occurring R times in
N trials", or P(R). The program named BNTABLES generates tables that supply
this information.
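
Written out as a formula, the statement above takes the standard textbook form shown below. The binomial coefficient notation and the index k are conventions assumed here for illustration; they are not taken from the BNTABLES program itself. Q = 1 - P, as defined later in this paper.

```latex
P(R) = \binom{N}{R} \, P^{R} \, Q^{N-R}, \qquad Q = 1 - P, \qquad R = 0, 1, \ldots, N

P(R \text{ or less}) = \sum_{k=0}^{R} \binom{N}{k} \, P^{k} \, Q^{N-k}
```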

PROGRAM BNTABLES

We will introduce this program with a sample problem. If you flip an honest
coin six times:

1. How many times would you expect to observe heads as the outcome?
   (Obviously the answer is three.)

2. What is the probability of observing heads exactly three times?

3. What is the probability of observing heads exactly one time?

4. What is the probability of observing heads exactly five times?

5. What is the probability of observing heads more than four times?

The user enters a probability, P. (This may be known, or assumed, depending
upon how the specific problem is structured.) He also enters the number
of trials, N (the sample size). Output can be directed to either the screen
or the printer. In our example, the user would enter P = 0.5 and N = 6, and
would obtain the following output:


CUMULATIVE BINOMIAL TABLE

P = .50000     N = 6

                            SIGMA P(R)        1 - SIGMA
  R          P(R)          P(R OR LESS)     P(MORE THAN R)

  0      .015625000        .015625000        .984375000
  1      .093750000        .109375000        .890625000
  2      .234375000        .343750000        .656250000
  3      .312500000        .656250000        .343750000
  4      .234375000        .890625000        .109375000
  5      .093750000        .984375000        .015625000
  6      .015625000       1.000000000        .000000000

The first column of the table contains all possible values for R (the number
of heads that could occur). The second column contains P(R), the probability
that the event (heads) will occur exactly R times in the N trials (the total
number of coin flips).

What is usually of more interest and more practical value is the probability
of an event occurring less than or more than some specified number of times.
Accordingly, the third column contains the CUMULATIVE probability of the
event occurring R times or less. The column heading is SIGMA P(R), since any
number in the column is the sum of all of the P(R) values up to and including
that row. The fourth column contains the probability of the event occurring
more than R times. Since this is the complement of the third column, the
heading is 1 - SIGMA.
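
The four columns described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' FORTRAN code; the function name `binomial_table` is hypothetical, and the direct use of the binomial coefficient is an assumption that is practical only for modest N.

```python
from math import comb


def binomial_table(p, n):
    """Return rows (R, P(R), P(R or less), P(more than R)) for R = 0..n."""
    rows = []
    cumulative = 0.0
    for r in range(n + 1):
        # Probability of exactly r occurrences in n trials.
        exact = comb(n, r) * p**r * (1.0 - p) ** (n - r)
        cumulative += exact
        rows.append((r, exact, cumulative, 1.0 - cumulative))
    return rows


if __name__ == "__main__":
    # Coin-flip example: P = 0.5, N = 6.
    for r, exact, at_most, more_than in binomial_table(0.5, 6):
        print(f"{r:3d} {exact:12.9f} {at_most:12.9f} {more_than:12.9f}")
```

Each row is built from the running sum of the P(R) column, which is the same relationship the SIGMA P(R) heading expresses.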

Look at the second column of the table. It contains the probabilities of
observing any exact number of heads in 6 trials. The largest probability
appears in the row where R is 3 (three heads). It is 0.3125. You have
answered questions #1 and #2. Look at the row where R is 1. The probability
of observing heads exactly one time is 0.09375. You have answered question
#3. Look at the row where R is 5. The probability of observing heads
exactly five times is also 0.09375. You have answered question #4. Look at
the row where R is 4. The probability of observing more than four heads
appears in the right column. It is 0.109375. This completes the sample
problem.

Take another look at the probabilities in column P(R) of the table. Note
that if you were to plot the binomial distribution defined by these
probabilities, the resulting curve would be symmetric (Chart #1). It is
interesting to note that the binomial distribution is symmetric only when
P = 0.5. As P moves away from 0.5, the binomial becomes more and more skewed
toward the center of the plot. However, if P is held constant at any value
(even a value far removed from 0.5), the binomial becomes more symmetric as N
increases. These facts will subsequently be examined in more detail, using
Charts #2, #3, and #4.

ANOTHER EXAMPLE

Let's look at another small problem that is more practical. Assume that the
Army has a contract with a manufacturer of spare parts for wheeled vehicles.
Contract terms for one specific part state that the Army will accept the
occurrence of random defects with probability P = 0.002 (an average of two
defects per 1000 parts). Assume that the manufacturer is able to conform to
these terms. An Army unit orders 500 parts. You have been asked to
determine several probabilities associated with the shipment.

What is the probability of no defects in the order? The probability of
exactly one defect? The probability of two defects or less? The probability
of four or more defects?

You would again run program BNTABLES, specifying P = .002 and N = 500, and
would obtain the following output:


CUMULATIVE BINOMIAL TABLE

P = .00200     N = 500

                            SIGMA P(R)        1 - SIGMA
  R          P(R)          P(R OR LESS)     P(MORE THAN R)

  0      .367511255        .367511255        .632488745
  1      .368247750        .735759005        .264240995
  2      .184123875        .919882880        .080117120
  3      .061251630        .981134510        .018865490
  4      .015251533        .996386043        .003613957
  5      .003031968        .999418011        .000581989
  6      .000501277        .999919289        .000080711
  7      .000070893        .999990182        .000009818
  8      .000008755        .999998937        .000001063
  9      .000000959        .999999896        .000000104

Look over the table. The first column, with heading R, contains values for
possible numbers of defects in the sample of 500. The second column contains
the probability of observing exactly R defects in 500 trials. The third
column contains the cumulative probability of finding R or fewer defects.
The fourth column contains the cumulative probability of finding more than R
defects. All of your answers can be read directly from the table:

The probability of zero defects = 0.3675 (1st figure in column 2).

The probability of one defect = 0.3682 (2nd figure in column 2).

The probability of two or less = 0.9199 (3rd figure in column 3).

Probability of four or more = 0.0189 (4th figure in column 4).

This is the same as the probability of "more than three", which
can be read directly from the table.
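
For a sample as large as N = 500, the same columns can be built without forming large factorials by stepping from one row to the next. The sketch below uses the standard identity P(R+1) = P(R) * (N - R) / (R + 1) * P / Q; whether BNTABLES works this way is not stated in this paper, and the function name is hypothetical.

```python
def binomial_table_recurrence(p, n, max_r):
    """Rows (R, P(R), P(R or less), P(more than R)) for R = 0..max_r.

    Assumes 0 < p < 1 and that (1 - p)**n does not underflow to zero.
    """
    q = 1.0 - p
    exact = q**n  # P(0): no occurrences in n trials
    cumulative = 0.0
    rows = []
    for r in range(max_r + 1):
        cumulative += exact
        rows.append((r, exact, cumulative, 1.0 - cumulative))
        # Step from P(R) to P(R + 1) without computing any factorial.
        exact *= (n - r) / (r + 1) * p / q
    return rows


if __name__ == "__main__":
    # Spare-parts example: P = 0.002, N = 500, first ten rows.
    for r, exact, at_most, more_than in binomial_table_recurrence(0.002, 500, 9):
        print(f"{r:3d} {exact:12.9f} {at_most:12.9f} {more_than:12.9f}")
```

The probability of "four or more" defects is the fourth-column entry in the R = 3 row, exactly as it is read from the printed table.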

PROGRAM BNCONFLM

Suppose that instead of dealing with a binomial probability that is known or
assumed, you have no information except for a random sample from a binomial
population. You have no idea what the population probability (or proportion)
might be. You can easily and quickly estimate this probability from the
sample by computing the sample proportion:

P = Number of Successes / Total Number of Trials.

You would also like to obtain confidence limits for this proportion to get an
idea of how good your estimate might be. An analogue to this would be
computing confidence limits for a sample average obtained from a population
with continuous data, when that population is known or assumed to be normally
distributed. These computations are relatively easy. However, no simple
method for direct computation of binomial confidence limits is available.

You might also wish to go a step farther and run a significance test, to
infer whether the true proportion of your population is different from that
of another population, or different from some "specified value" or
"theoretical standard". If you have confidence limits available, you can
easily determine statistical significance by using the method of overlapping
confidence limits [Snedecor and Cochran, 1980, p. 66].

Program BNCONFLM provides binomial confidence limits quickly and
conveniently. It reads data observed from a binomial sample and computes
the proportion (or fraction) of the observations that are in the category of
primary interest. It then computes several sets of confidence limits and
confidence intervals (confidence ranges) for this proportion. These are
computed for the 99%, 98%, 95%, 90%, and 80% confidence levels. With this
information, the user can easily run significance tests at the various levels
most commonly used in statistical work.

We will also demonstrate program BNCONFLM with a sample problem. In the
responses to a questionnaire received from 250 soldiers at Fort Duncan we
find that 33 soldiers (13.2%) show a negative attitude toward the quality of
health care at the post. What is the maximum percent of the total soldier
population at the post that we should expect to show this negative attitude
assuming we have a representative sample? Use the 99% confidence level.

The user runs program BNCONFLM. He enters 33 as his count of observations in
the category of primary interest, and 250 as the sample size (number of
trials). He obtains the following output:

BINOMIAL CONFIDENCE LIMITS

COUNT IN CATEGORY OF INTEREST = 33     SAMPLE SIZE = 250

FRACTION OBSERVED = .13200

99%   LIMITS =  .08233  AND  .19620     RANGE =  .11387
98%   LIMITS =  .08641  AND  .18971     RANGE =  .10329
95%   LIMITS =  .09263  AND  .18036     RANGE =  .08772
90%   LIMITS =  .09820  AND  .17250     RANGE =  .07430
80%   LIMITS =  .10487  AND  .16367     RANGE =  .05881

Note that the fraction observed is 0.132, and the 99% confidence limits are
0.082 and 0.196. We conclude that the best estimate of the negative attitude
that we can make for the entire population (with the available data) is
13.2%, the fraction observed from the sample. We also conclude that we can
be 99% sure that the true population attitude is contained within the limits
of 8.2% and 19.6%. Therefore, the maximum negative response we should expect
is 19.6%.
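
This paper does not say how BNCONFLM finds its limits. One standard exact construction, sketched below as an assumption, is to search for the two population proportions at which the observed count sits just inside each tail of the binomial (often called Clopper-Pearson limits). The function names are hypothetical, and the simple bisection search is chosen for clarity rather than speed.

```python
from math import comb


def binom_cdf(r, n, p):
    """P(R or less) for a binomial with n trials and probability p."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(r + 1))


def _solve(func, target, increasing):
    """Bisection on [0, 1] for the p at which func(p) equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid  # root lies at a larger p
        else:
            hi = mid  # root lies at a smaller p
    return (lo + hi) / 2.0


def exact_limits(successes, n, confidence):
    """Two-sided exact binomial confidence limits (assumed construction)."""
    tail = (1.0 - confidence) / 2.0
    if successes == 0:
        lower = 0.0
    else:
        # P(successes or more) rises with p; find where it equals the tail.
        lower = _solve(lambda p: 1.0 - binom_cdf(successes - 1, n, p), tail, True)
    if successes == n:
        upper = 1.0
    else:
        # P(successes or less) falls with p; find where it equals the tail.
        upper = _solve(lambda p: binom_cdf(successes, n, p), tail, False)
    return lower, upper


if __name__ == "__main__":
    for level in (0.99, 0.98, 0.95, 0.90, 0.80):
        low, high = exact_limits(33, 250, level)
        print(f"{level:.0%}  LIMITS = {low:.5f} AND {high:.5f}  RANGE = {high - low:.5f}")
```

Every candidate limit requires a full pass over the cumulative binomial, which gives a sense of why these computations are laborious by hand.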

THE QUESTION OF SAMPLE SIZE

Binomial confidence limits can be used to obtain some insight about the
effect of larger sample sizes on the confidence range for a proportion that
is being estimated. Assume you have a sample of data where N = 10, and one
of the 10 observations is a success. Running this through program BNCONFLM,
you would obtain a confidence range of 0.44249 at the 95% confidence level.
Assume that this confidence range is at least a partial indicator of the
value obtained from your sample of 10 observations. You would assume that if
you spend the time and money to obtain a larger sample, you could reduce the
size of the confidence range, thereby improving your knowledge and enhancing
the value of your information.

Assume further that you could obtain more sample observations and that the
observed proportion would remain at 0.10. In the real world this would be
most unlikely, but the purpose of this exercise is to demonstrate what would
happen if nothing changed except the sample size. If you increase the sample
size by adding observations in increments of 10, a summary of the results
would appear as follows:

EFFECT OF SAMPLE SIZE CHANGES UPON 95% CONFIDENCE RANGES

PROPORTION OF SUCCESSES HELD CONSTANT AT 0.1

SAMPLE     CONFIDENCE     PERCENT REDUCTION      CHANGE FROM
 SIZE        RANGE        FROM INITIAL RANGE     PREVIOUS ROW

  10        0.44249
  20        0.30463             31.2                 31.2
  30        0.24417             44.8                 13.6
  40        0.20871             52.8                  8.0
  50        0.18486             58.2                  5.4
  60        0.16747             62.2                  4.0
  70        0.15409             65.2                  3.0
  80        0.14339             67.6                  2.4
  90        0.13460             69.6                  2.0

Note that when you add a second increment of 10 observations to your initial
sample size, the confidence range is reduced by 31.2%. This appears to be a
substantial reduction. The confidence limits (not shown) have moved closer
together. You can now make a stronger statement about the population
proportion. When the third increment of 10 observations is added, you obtain
another 13.6% reduction - still very nice. As you continue to add more
increments of 10, however, you will notice that you are buying less and less
additional information each time. The same phenomenon exists with continuous
data; however, the rate of change is not identical. At some point you will
stop and ask yourself, "Is this next batch of data actually worth the
additional time and money?". This is a practical question that often needs
to be addressed.
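
The two percentage columns of the sample size summary follow directly from the confidence ranges. The sketch below shows that arithmetic, assuming each percentage is rounded to one decimal place before the row-to-row change is taken; the names `RANGES` and `reduction_summary` are hypothetical.

```python
# 95% confidence ranges from the summary, keyed by sample size.
RANGES = {
    10: 0.44249, 20: 0.30463, 30: 0.24417, 40: 0.20871, 50: 0.18486,
    60: 0.16747, 70: 0.15409, 80: 0.14339, 90: 0.13460,
}


def reduction_summary(ranges):
    """Rows (size, range, percent reduction from initial, change from previous)."""
    sizes = sorted(ranges)
    initial = ranges[sizes[0]]
    rows = []
    previous = 0.0
    for n in sizes[1:]:
        # Percent by which the range has shrunk relative to the first sample.
        reduction = round(100.0 * (1.0 - ranges[n] / initial), 1)
        rows.append((n, ranges[n], reduction, round(reduction - previous, 1)))
        previous = reduction
    return rows


if __name__ == "__main__":
    for n, width, reduction, change in reduction_summary(RANGES):
        print(f"{n:4d} {width:9.5f} {reduction:6.1f} {change:6.1f}")
```

The last column makes the diminishing return explicit: each added batch of 10 observations buys a smaller reduction than the batch before it.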

MORE BACKGROUND ON THE BINOMIAL

We previously stated that the binomial distribution is symmetric only when
P = 0.5. As P moves away from 0.5, the binomial becomes more and more skewed
toward the center of the plot. A more general statement [James and James,
1976, p. 32] is, "When N is large, the binomial distribution can be
approximated by a normal distribution with mean NP and variance NPQ
(where Q = 1 - P). The binomial distribution can also be approximated by
a Poisson distribution with mean NP if N is large".

Let's examine this statement by looking at a plot of the binomial where
P = 0.01 and N = 210. We will then observe the impact of holding P constant
while N is increased. The scales will be held constant so the changes to the
curve will not be distorted. We will first double N, then increase it to
five times its original value. When N = 210 the distribution is severely
skewed to the right (Chart #2). When N = 420 the symmetry is much improved,
although some skewness is still obvious (Chart #3). When N = 1050 the curve
appears quite symmetric at first glance, but with careful observation some
skewness can still be detected (Chart #4).
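
The trend seen in the charts can also be put in numbers. The sketch below uses the standard skewness coefficient of the binomial, (Q - P) / sqrt(NPQ), which is a textbook result and not a quantity reported by either program; the function name is hypothetical.

```python
from math import sqrt


def binomial_skewness(n, p):
    """Skewness coefficient of a binomial with n trials and probability p."""
    q = 1.0 - p
    return (q - p) / sqrt(n * p * q)


if __name__ == "__main__":
    # Same cases as Charts #2, #3, and #4: P held at 0.01 while N grows.
    for n in (210, 420, 1050):
        print(f"N = {n:5d}   skewness = {binomial_skewness(n, 0.01):.4f}")
```

The coefficient is positive (a tail to the right) for P below 0.5, zero at P = 0.5, and shrinks toward zero in proportion to one over the square root of N, so it never quite vanishes for P = 0.01.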

Because of facts such as those just discussed, statisticians have been able
to derive several "shortcut" approximation methods for determining confidence
limits and statistical significance that are widely used in analyzing
binomial data. Frequently they give good results, but occasionally they give
not-so-good results. The danger of obtaining bad results with approximation
methods is greatest when N is small and when the probabilities approach
extreme values (either zero or one). Statistical tests that are derived
directly from the binomial distribution, however, are EXACT tests.
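
One way to see such a bad result is to apply the simplest normal-approximation interval to a small sample. This paper does not name a specific shortcut, so the choice below (observed proportion plus or minus z standard errors) is an assumption made for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_limits(successes, n, confidence):
    """Shortcut limits: observed proportion +/- z * sqrt(p * (1 - p) / n)."""
    p_hat = successes / n
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    low, high = normal_approx_limits(1, 10, 0.95)
    print(f"1 success in 10 trials: {low:.5f} to {high:.5f}")
```

For one success in 10 trials the lower limit comes out negative, which is impossible for a proportion. With 33 successes in 250 trials the same shortcut stays inside the range from zero to one, which illustrates why trouble is concentrated at small N and extreme proportions.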

How can we make the preceding statement? The statistical literature seldom
uses the term "EXACT test". However, the literature abounds with
methodologies with names such as Normal Approximation to the Binomial,
Poisson Approximation to the Binomial, and Chi-square Approximation to the
Binomial. They imply that the binomial is a basic standard without stating
it directly - that it is a fundamental mathematical truth so self-evident
that the statement is unnecessary. To back up our EXACT test statement
we offer the two following arguments:

1. The binomial distribution can be derived from the basic laws of
   mathematical probability [Snedecor, p. 107].

2. The binomial distribution exists in nature - in the real world - as
   surely as any law of physics. One need only identify a situation
   where observations fall into just two categories, and the
   probability of the event of interest occurring during any one trial
   is constant. This can be represented by a well constructed bead
   box containing beads of two colors, where the experimenter draws
   randomly with replacement. Or by an honest coin with honest
   flips. This may not sound impressive, until one thinks about the
   fact that many of the commonly used probability distributions
   do not exist in the real world. Their usefulness is in the fact
   that they give an excellent approximation of many things that do
   occur in the real world. And mathematicians have worked with them
   for many years and created convenient tables that will answer
   almost any conceivable question about them that might be asked.

The binomial distribution was first discovered by the Swiss mathematician
Jacques Bernoulli (1654-1705). It was published in 1713, after his death.
Some texts refer to the binomial as the Bernoulli distribution. Again,
statistical tests that are derived directly from this distribution are
EXACT tests.

WHY ISN'T THE BINOMIAL MORE WIDELY USED?

One might ask the question, "If the binomial is so good, why isn't it more
widely used? Why were so many short cut methods developed?" Unfortunately,
the binomial requires a different table for each probability and sample
size. Most general purpose statistical texts contain only a few. Several
books of binomial tables have been published. One of these is called TABLES
OF CUMULATIVE BINOMIAL PROBABILITIES, ORDNANCE CORPS, U.S. ARMY ORDP20-1,
1952. However, even books that are dedicated to binomial tables tend to not
have the precise table that you really need for your practical application.

Direct computation of individual or cumulative probabilities is conceptually
simple. However, it is usually impractical to do this manually unless the
sample size is small. Large sample sizes require extensive calculations.
Direct computation of confidence limits for a probability (or a proportion)
that is estimated from a large sample requires unbelievably laborious
calculations. As previously stated, both of the binomial programs were
written in FORTRAN, a language well equipped to handle these computations.
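
How the FORTRAN programs organize these calculations is not described here. One common way to keep a single P(R) computable when N is very large, sketched below as an assumption, is to work with logarithms so that no enormous factorial is ever formed. The function name is hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf_log(r, n, p):
    """P(R) computed through logarithms; assumes 0 < p < 1 and 0 <= r <= n."""
    # Logarithm of the binomial coefficient via the log-gamma function.
    log_coeff = lgamma(n + 1) - lgamma(r + 1) - lgamma(n - r + 1)
    return exp(log_coeff + r * log(p) + (n - r) * log1p(-p))


if __name__ == "__main__":
    print(f"{binomial_pmf_log(3, 6, 0.5):.9f}")       # coin-flip example
    print(f"{binomial_pmf_log(0, 500, 0.002):.9f}")   # spare-parts example
    print(f"{binomial_pmf_log(100000, 9999999, 0.01):.9f}")
```

The trade-off is a small loss of precision for very large N, since the result is the exponential of a difference between large logarithms.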

It is not the goal of this paper to be critical of the normal distribution,
or of the chi-square or the Poisson. These distributions all have important
places in the statistician's tool kit. When they are used properly, they
generally give very good results. But when you deal with binomial data, why
not go back to the basics and use an exact test? Personal computers are
available in most offices. The software is very "user friendly". And the
user need not concern himself with questions such as whether populations are
normally distributed, or chi-square expected frequencies are too small.

HARDWARE AND SOFTWARE SUMMARY:

HARDWARE REQUIRED: IBM PC or compatible, with 320 K RAM
SOFTWARE REQUIRED: MS-DOS 2.1 or higher (FORTRAN compiler not needed)
BINOMIAL TABLE LIMITATIONS: The user can request any probability
greater than 0.0 and less than 1.0. He can request any integer
value for N (number of trials) between 1 and 9,999,999.

AVAILABILITY OF THE PROGRAMS:

The programs are available from the software library of the
Command and Control Microcomputer User's Group (C2MUG):

Associate Director, MCS CSE
ATTN: AMSEL-RD-SE-MCS (C2MUG)
Building 138
Fort Leavenworth, KS 66027-5600

The programs should be ordered through your C2MUG representative.
If your organization does not have a representative, you can call
C2MUG at DSN 552-755C and ask how to order software. The binomial
software is too new for inclusion in the 1991 C2MUG Software Catalog,
but it will be included in the 1992 catalog. Individuals ordering
before the new catalog is available will need to know the catalog
number. It is 500-009.


QUESTIONS:

Questions, comments, and suggestions are encouraged. They should
be directed to:

Commander, USAOC&S
ATTN: ATSL-CD-CS (Mr. Robert Dick)
Aberdeen Proving Ground, MD 21005-5201

Phone: DSN 298-2028
       COM 301-278-2028

[Charts #1 through #4: BINOMIAL DISTRIBUTION]